Analysis of the Quotation Corpus of the Russian Wiktionary
Abstract
The quotations in the Russian Wiktionary were evaluated quantitatively using a purpose-built Wiktionary parser. The number of quotations in the dictionary is growing fast (51.5 thousand in 2011, 62 thousand in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary, for which tables related to the quotations were designed. A histogram was built showing the distribution of quotations by the year in which the quoted literary works were written. An attempt was made to explain the features of the histogram by associating them with the years of the most popular and most cited (in the Russian Wiktionary) writers of the nineteenth century. More than one-third of all the quotations (example sentences) in the Russian Wiktionary were found to have been taken by the editors of Wiktionary entries from the Russian National Corpus.
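As a rough illustration of the kind of analysis the abstract describes, the following is a minimal sketch only, not the authors' parser or schema. The table `quotation`, its columns (`text`, `author`, `year`, `from_corpus`), the helper names, and the sample rows are all hypothetical, invented for this example. It stores quotations in a relational table, counts them per decade as a histogram would, and computes the share taken from a corpus.

```python
import sqlite3
from collections import Counter


def build_demo_db():
    """Create an in-memory database with a hypothetical quotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotation ("
        " id INTEGER PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " author TEXT,"
        " year INTEGER,"                         # year the quoted work was written
        " from_corpus INTEGER NOT NULL DEFAULT 0"  # 1 if taken from a corpus
        ")"
    )
    # Placeholder rows, not real Wiktionary data.
    rows = [
        ("example sentence 1", "Author A", 1836, 1),
        ("example sentence 2", "Author A", 1831, 0),
        ("example sentence 3", "Author B", 1869, 1),
        ("example sentence 4", "Author C", 1895, 0),
        ("example sentence 5", None, None, 0),  # quotation with unknown year
    ]
    conn.executemany(
        "INSERT INTO quotation (text, author, year, from_corpus)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def quotations_per_decade(conn):
    """Return {decade: count} for quotations whose year is known."""
    counts = Counter()
    query = "SELECT year FROM quotation WHERE year IS NOT NULL"
    for (year,) in conn.execute(query):
        counts[year // 10 * 10] += 1
    return dict(sorted(counts.items()))


def corpus_share(conn):
    """Return the fraction of all quotations flagged as taken from a corpus."""
    total, from_corpus = conn.execute(
        "SELECT COUNT(*), SUM(from_corpus) FROM quotation"
    ).fetchone()
    return from_corpus / total if total else 0.0


if __name__ == "__main__":
    conn = build_demo_db()
    print(quotations_per_decade(conn))
    print(corpus_share(conn))
```

Grouping by decade is an arbitrary choice made here for brevity; the abstract does not state the bin width of the histogram.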
Related resources
The comparison of Wiktionary thesauri transformed into the machine-readable format
St. Petersburg Institute for Informatics and Automation RAS, http://code.google.com/p/wikokit/. Wiktionary is a unique, peculiar, valuable and original resource for natural language processing (NLP). The paper describes an open-source Wiktionary parser: its architectur...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
Related terms search based on WordNet / Wiktionary and its application in Ontology Matching
A set of ontology matching algorithms (for finding correspondences between concepts) is based on a thesaurus that provides the source data for the semantic distance calculations. In this wiki era, new resources may spring up and improve this kind of semantic search. In the paper a solution of this task based on Russian Wiktionary is compared to WordNet based algorithms. Metrics are estimated us...
Analysis of the Impact of Economic Sanctions on Health Research and Publication Activities of Scientists from Iran
The article discusses the publication activity of scientists studying the consequences of US economic sanctions against Iran, and the impact of these sanctions on the development of science and the economy in this country. The paper considers the dynamics of the publication activity of Iranian scientists in the field of biomedicine over the past 20 years. Increased sanctions have led to a shortage...
Transformation of Wiktionary entry structure into tables and relations in a relational database schema
This paper addresses the question of automatic data extraction from the Wiktionary, which is a multilingual and multifunctional dictionary. Wiktionary is a collaborative project working on the same principles as the Wikipedia. The Wiktionary entry is a plain text from the text processing point of view. Wiktionary guidelines prescribe the entry layout and rules, which should be followed by edito...
Journal title:
- Research in Computing Science
Volume 56
Publication date: 2012